Abstract

In  an  empirical  analysis  of  software  maintenance  projects  in  a  large  IBM  COBOL  transaction 
processing  environment  the  impacts  of  software  complexity  upon  project  costs  were  estimated. 
Program size, modularity, and the use of branching were all found to significantly affect
software  maintenance  costs.  It  was  estimated  that  projects  that  were  required  to  perform 
maintenance  on  systems  with  greater  underlying  code  complexity  cost  approximately  35% 
more  than  similar  projects  dealing  with  less  complex  code.  These  costs  amount  to  several 
million  dollars  a  year  at  this  site  alone.  A  generalizable  model  is  provided  to  allow  researchers 
and  managers  to  estimate  these  costs  in  other  environments. 


ACM  CR  Categories  and  Subject  Descriptors:   D.2.7  [Software  Engineering]:  Distribution  and 
Maintenance;  D.2.8   [Software   Engineering]:  Metrics;  D.2.9   [Software   Engineering]: 
Management;  F.2.3  [Analysis  of  Algorithms  and  Problem  Complexity]:  Tradeoffs  among 
Complexity  Measures;  K.6.0  [Management  of  Computing  and  Information  Systems]:  General  - 
Economics;  K.6.1  [Management  of  Computing  and  Information  Systems]:  Project  and  People 
Management;  K.6.3  [Management  of  Computing  and  Information  Systems]:  Software 
Management 

General  Terms:  Management,  Measurement,  Performance. 

Additional  Key  Words  and  Phrases:  Software  Productivity,  Software  Maintenance,  Software 
Complexity. 


I.  Introduction 

With  software  costs  now  exceeding  $200  billion  annually  and  with  most  of  that  being  spent  on 
maintenance  of  all  types,  rather  than  new  development,  the  economic  incentives  to  develop 
software  that  requires  less  repair  maintenance  and  is  more  easily  adapted  to  changing 
requirements  are  quite  strong  [Boehm,  1979]  [Boehm,  1987]  [Gallant,  1986].  In  this  paper, 
we  test  and  measure  the  degree  to  which  the  maintainability  of  a  system  is  influenced  by  the 
complexity  of  the  existing  code.  In  particular,  we  investigate  the  impact  of  software 
complexity  upon  the  productivity  of  software  maintainers. 

The  empirical  evidence  linking  software  complexity  to  software  maintenance  costs  has  been 
criticized  as  being  relatively  weak  [Kearney  et  al.,  1986].  Much  of  the  early  work  is  based 
upon  experiments  involving  small  programs  [Curtis,  Shepperd  and  Milliman,  1979]  or  is 
based  upon  analysis  of  programs  written  by  students  [Kafura  and  Reddy,  1987].  Such 
evidence  can  be  valuable,  but  several  researchers  have  noted  that  caution  must  be  used  in 
applying  these  results  to  the  actual  commercial  application  systems  which  account  for  most 
software maintenance expenditures [Conte, Dunsmore and Shen, 1986 p. 114] [Gibson and
Senn,  1989].  And,  the  limited  field  research  that  has  been  done  has  generated  either  no  or 
conflicting  results;  as,  for  example,  in  the  case  of  degree  of  program  modularity  [Vessey  and 
Weber,  1983]  [Basili  and  Perricone,  1984]  [Card,  Page  and  McGarry,  1985],  and  in  the  case 
of program structure (see Vessey and Weber's 1984 review article). Finally, none of the
previous  work  develops  estimates  of  the  actual  cost  of  complexity,  estimates  which  could  be 
used  by  software  maintenance  managers  to  make  best  use  of  their  resources.  Research 
supporting the statistical significance of a factor is a necessary first step in this process, but
practitioners  must  also  have  an  understanding  of  the  magnitudes  of  these  effects  if  they  are  to 
be  able  to  make  informed  decisions  regarding  their  control. 

This  study  analyzes  the  effects  of  software  complexity  upon  the  costs  of  COBOL  maintenance 
projects  within  a  large  commercial  bank.  Freedman  notes  that  60%  of  all  business 
expenditures  on  computing  are  for  maintenance  of  COBOL  programs,  and  that  there  are  over 
50  billion  lines  of  COBOL  in  existence  worldwide,  the  maintenance  of  which,  therefore, 
represents  an  information  systems  activity  of  considerable  economic  importance  [Freedman, 
1986]. Using a previously developed model of software maintenance productivity [Banker,
Datar and Kemerer, 1991]^1, we estimate the marginal impact of software complexity upon the
costs of software maintenance projects in a data processing environment. The analysis
confirms  that  software  maintenance  costs  are  significantly  affected  by  software  complexity, 
measured  in  three  dimensions:  module  size,  procedure  modularity,  and  control  structure 
complexity.  The  results  further  suggest  that  the  magnitudes  of  these  costs  are  such  that 
software  maintenance  managers  should  monitor  the  complexity  of  the  software  under  their 
control,  and  take  active  steps  to  reduce  that  complexity. 

This  research  makes  contributions  in  two  distinct  areas.  The  first  is  in  developing  a  model  with 
which  to  resolve  some  current  academic  debate  regarding  the  nature  of  the  impact  of  software 
complexity,  and  the  shape  of  the  functional  form  relating  complexity  to  the  productivity  of 
software  maintainers.  The  second  is  in  providing  practicing  software  maintenance  managers 
with  a  predictive  model  with  which  to  evaluate  the  future  effects  of  software  design  decisions. 
This  model  could  also  be  used  to  assist  in  the  cost-benefit  assessment  of  a  class  of  computer- 
aided  software  engineering  (CASE)  tools  known  as  restructurers. 

The  remainder  of  this  paper  is  organized  as  follows.  Section  II  outlines  the  research  questions, 
and summarizes previous field research in this area. Section III describes our research
approach and methodology, and section IV presents our model and results. Implications for
practitioners  are  presented  in  section  V,  and  concluding  remarks  and  suggestions  for  future 
research  are  provided  in  the  final  section. 

II.  Research  Questions 

Complexity  and  maintenance 

The  complexity  of  a  software  system  is  said  to  increase  as  "the  number  of  control  constructs 
grows  and  as  the  size  in  the  number  of  modules  grows"  [Conte,  Dunsmore  and  Shen,  1986, 
p.  109].  The  formal  characterization  of  the  maintenance  impacts  of  software  complexity  is 
sometimes  ascribed  to  Belady  and  Lehman,  who,  in  their  Evolution  Dynamics  theory,  propose 
that  software  systems,  like  their  analogues  in  other  contexts,  face  increasing  entropy  over  time 
[Belady  and  Lehman,  1976].  As  more  changes  are  made  to  a  system  in  the  form  of 
maintenance  requests,  the  initial  design  integrity  deteriorates,  and  the  system's  complexity 
increases.  In  addition,  several  longitudinal  studies  have  noted  increases  in  the  size  of  software 


^1 Hereafter referenced as "BDK, 1991".


systems  that  are  in  active  use  [Lawrence,  1982]  [Chong  Hok  Yuen,  1987].  Both  of  these 
factors  have  been  suggested  to  contribute  to  the  increasing  difficulty  of  software  maintenance 
over  time.  Given  the  growing  economic  importance  of  maintenance,  researchers  have 
attempted  to  empirically  validate  these  theories.  In  general,  however,  researchers  have  not 
been  able  to  empirically  test  the  impact  of  complexity  upon  maintenance  effort  while  controlling 
for  other  factors  known  to  affect  costs.  Therefore,  our  overall  research  question  (to  be 
developed  into  specific  testable  hypotheses  below)  will  be: 

Research question 1: Controlling for other factors known to affect software maintenance
project costs, what is the impact of software complexity upon the productivity of software
maintenance  projects? 

Size  and  Modularity 


A key component of structured programming approaches is modularity, defined by Conte, et
al., as "the programming technique of constructing software as several discrete parts" [1986,
p. 197]. Freedman and Weinberg have estimated that 75-80% of existing software was
produced prior to significant use of structured programming [Schneidewind, 1987], and
therefore the absence of modularity is likely to be a significant practical problem. A number of
researchers have attempted to empirically validate the impact of modularity on either software
quality or productivity with data from actual systems, and the results of this research are
summarized in Table 1.

Table 1: Previous Field Research on Modularity

Year  Researchers         Data             Dependent Variable        Conclusions^2
----  ------------------  ---------------  ------------------------  ---------------------------
1983  Vessey & Weber      COBOL            # of Repairs              Unidirectional +,0
1984  Basili & Perricone  Fortran          Errors/KSLOC              Unidirectional -
1984  Bowen               Algol, CMS, etc  McCabe, Halstead metrics  Suggests 2-way relationship
1984  Boydston            Assembler, PLS   Effort                    Suggests 2-way relationship
1985  Shen, et al.        Pascal, PLS etc  Problem reports           Suggests 2-way relationship
1985  Card, et al.        Fortran          Effort                    Unidirectional -,0
1987  An, et al.          C                Change data               Unidirectional, -?
1989  Lind & Vairavan     Pascal, Fortran  Normalized change data    Suggests 2-way relationship

^2 For unidirectional tests, "+" indicates that greater modularity (more, smaller modules) improved performance,
"-"  indicates  that  less  modularity  (fewer,  larger  modules)  improved  performance,  and  "0"  indicates  mixed  or  no 
results.  A  2-way  relationship  is  one  in  which  both  positive  and  negative  deviations  from  optimal  module  size 
reduce  performance. 


Perhaps  the  first  widely  disseminated  field  research  in  this  area  was  by  Vessey  and  Weber,  in 
their  study  of  repair  maintenance  in  Australian  and  US  data  processing  organizations  [Vessey 
and  Weber,  1983].  Their  work  relied  on  subjective  assessments  of  the  degree  of  modularity  in 
a  large  number  of  COBOL  systems.  In  one  dataset  they  found  that  more  modular  code  was 
associated with fewer repairs; in the other dataset no effect was found. Basili and Perricone, in
an  analysis  of  a  large  Fortran  system,  found  more  errors  per  thousand  source  lines  of  code 
(KSLOC)  in  smaller  modules,  which  they  hypothetically  attributed  to  a)  greater  numbers  of 
interface  errors,  b)  possible  greater  care  taken  in  coding  larger  modules,  or  c)  simply  the 
continued  presence  of  undiscovered  errors  in  larger  modules  [Basili  and  Perricone,  1984]. 
Shen,  et  al.  disagreed  with  Basili  and  Perricone's  analysis,  noting  that  the  higher  error  rate 
observed  with  smaller  modules  could  be  simply  a  function  of  an  empirically  observed 
phenomenon  that  modules  contain  a  number  of  errors  independent  of  size,  in  addition  to  a 
size-related  error  rate.  Therefore,  according  to  this  model,  smaller  modules  will  show  a  higher 
rate  of  errors  due  to  this  size-independent  error  component  being  divided  by  a  smaller  number 
of  lines  of  code.  Shen,  et  al.  conclude  that  "...it  may  be  beneficial  to  promote  programming 
practices related to modularization that discourage the development of either extremely large or
extremely small modules." [Shen et al., 1985, p. 323].
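
The argument of Shen, et al. can be illustrated with a small numerical sketch. The two constants below (a fixed number of size-independent errors per module and a per-line error rate) are hypothetical values chosen only for illustration; they are not taken from any of the studies cited here.

```python
def errors_per_ksloc(sloc, fixed_errors=2.0, errors_per_line=0.01):
    """Error density (errors per 1000 SLOC) for a module of `sloc` lines.

    Assumes, hypothetically, that every module carries `fixed_errors`
    errors regardless of its size, plus `errors_per_line` errors for
    each line of code it contains.
    """
    total_errors = fixed_errors + errors_per_line * sloc
    return 1000.0 * total_errors / sloc


# Density falls as module size grows, even though the per-line rate is constant.
for size in (50, 200, 1000):
    print(size, round(errors_per_ksloc(size), 1))
```

Under these assumed values the smallest module shows the highest error density solely because the fixed component is divided by fewer lines, which is the effect Shen, et al. describe.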

Bowen,  in  an  analysis  of  secondary  data,  compared  the  number  of  SLOC  /  module  with  a  set 
of  assumed  maximum  values  of  two  well-known  complexity  metrics,  McCabe's  V(G)  and 
Halstead's  N  [Bowen,  1984].  He  concluded  that  the  optimal  values  of  SLOC  /  module 
differed  across  languages,  but  that  all  were  much  less  than  the  DoD's  proposed  standard  of  200 
SLOC  /  module.  In  his  suggestions  for  future  research,  he  notes  that  "More  research  is 
necessary  to  derive  and  validate  upper  and  lower  bounds  for  module  size.  Module  size  lower 
bounds,  or  some  equivalent  metric  such  as  coupling,  have  been  neglected;  however  they  are 
just  as  significant  as  upper  bounds.  With  just  a  module  size  upper  bound,  there  is  no  way  to 
dissuade  the  implementation  of  excessively  small  modules,  which  in  turn  introduce  intermodule 
complexity,  complicate  software  integration  testing,  and  increase  computer  resource  overhead." 
[1984, p. 331]. Boydston, in his analysis of programmer effort, noted that "...as a project gets
larger,  the  additional  complexity  of  larger  modules  has  to  be  balanced  by  the  increasing 
complexity  of  information  transfer  between  modules."   [Boydston,  1984  p.  159].  Card,  Page, 
and  McGarry  tested  the  impact  of  module  size  and  strength  (singleness  of  purpose)  on 
programming  effort  [Card,  Page  and  McGarry,  1985].  In  their  basic  analysis,  they  found  that 
effort  decreased  as  the  size  of  the  module  increased.  However,  they  also  noted  that  effort 
decreased as strength increased, but that increases in strength were associated with decreases in
module size. Their conclusion was that nothing definitive could be stated about the impact of
module size.

An,  Gustafson,  and  Melton,  in  analyzing  change  data  from  two  releases  of  UNIX,  found  that 
the  average  size  of  unchanged  modules  (417  lines  of  C)  was  larger  than  that  of  changed 
modules  (279  lines  of  C)  [An,  Gustafson  and  Melton,  1987].  Unfortunately,  the  authors  do 
not  provide  any  analysis  to  determine  if  this  difference  is  statistically  significant.  Most 
recently, Lind and Vairavan analyzed the change rate (number of changes per 100 lines of
code)  versus  a  lines  of  code-based  categorical  variable  [Lind  and  Vairavan,  1989].  They 
found  that  minimum  values  of  change  density  occurred  in  the  middle  of  their  ranges, 
suggesting  that  modules  that  were  both  too  large  and  too  small  increased  the  amount  of  change 
density. They further suggest that, for the Pascal and Fortran programming languages, the
optimum value might be between 100 and 150 SLOC. This is in contrast to some suggestions
from  work  in  Japan,  where,  for  example,  at  Toshiba  the  management  heuristic  is  no  more  than 
50  lines  per  module  [Matsumura  et  al.,  1987]. 

The  results  of  these  previous  studies  can  be  summarized  as  follows.  Researchers  looking  for 
unidirectional results (i.e., that either smaller modules or larger modules were better) have
found  either  no  or  contradictory  results.  Other  researchers  have  suggested  that  a  U-shaped 
function  exists,  that  is,  both  modules  that  are  too  small  and  modules  that  are  too  large  are 
problematic.  In  the  case  of  many  small  modules,  the  number  of  intermodule  interfaces  is 
increased,  and  interfaces  have  been  shown  to  be  among  the  most  problematic  components  of 
programs  [Basili  and  Perricone,  1984].  In  the  case  of  a  few  very  large  modules,  these 
modules are less likely to be devoted to a single purpose and may be assumed to be more
complex, and both of these factors have been linked with larger numbers of errors and
therefore higher maintenance costs [Card, Page and McGarry, 1985] [Vessey and Weber,
1983].
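
One way to picture the U-shaped hypothesis is with a toy cost function, sketched below. It assumes, purely for illustration, that each module contributes a fixed interface cost while its internal cost grows with the square of its size. The functional form, the function names, and all constants are hypothetical and are not drawn from the studies summarized above.

```python
import math


def maintenance_cost(module_size, total_sloc=10_000,
                     interface_cost=100.0, size_cost=0.01):
    """Toy cost of maintaining `total_sloc` lines split into equal modules.

    Hypothetical form: every module adds a fixed interface cost, and
    each module's internal cost grows with the square of its size.
    """
    n_modules = total_sloc / module_size
    return n_modules * (interface_cost + size_cost * module_size ** 2)


def optimal_module_size(interface_cost=100.0, size_cost=0.01):
    """Module size minimizing the toy cost: sqrt(interface_cost / size_cost)."""
    return math.sqrt(interface_cost / size_cost)


# Cost is higher on either side of the optimum.
for size in (25, 100, 400):
    print(size, maintenance_cost(size))
print("optimum:", optimal_module_size())
```

In this sketch both very small and very large modules are more costly than intermediate ones, which is the qualitative pattern the 2-way studies suggest; the location of the minimum depends entirely on the assumed constants.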

However,  the  researchers  who  have  suggested  the  U-shaped  curve  hypothesis  either  provide 
no or very limited (i.e., categorical) data linking size and cost. They also, in general, do not
provide a methodology for determining the optimum program size.^3 Finally, recent research
on  software  complexity  metrics  has  suggested  that,  for  large  systems,  modularity  is  most 
appropriately  measured  at  multiple  levels  of  program  organization  [Zweig,  1989].  This  is 


^3 Boydston does extrapolate from his dataset to suggest a specific square root relationship between number of
new  lines  of  code  and  number  of  modules  for  his  Assembler  and  PLS  language  data  [Boydston,  1984]. 


because,  as  will  be  explained  in  greater  detail  in  Section  III  below,  the  effects  of  breaking  an 
application  into  modules  of  an  appropriate  size  are  believed  to  be  distinct  from  those  of 
breaking  those  modules  into  their  component  subprograms  or  procedures  [Zweig,  1989]. 
Therefore,  a  general  research  question  to  be  addressed  is: 

Research  question  2:  Do  software  maintenance  costs  depend  significantly  upon  degree  of 
modularity,  measured  at  multiple  levels,  with  costs  rising  for  applications  that  are  either  under 
or  over  modularized? 

Structure 

An  excellent  review  of  the  empirical  research  on  structured  programming  is  provided  by 
[Vessey  and  Weber,  1984].  Therefore,  this  section  will  only  briefly  summarize  the  arguments 
presented  there.  Structured  programming  is  a  design  approach  that  limits  programming 
constructs  to  three  basic  control  structures.  Because  these  structures  are  often  difficult  to 
adhere  to  using  the  GOTO  syntax  found  in  older  programming  languages,  this  approach  is 
sometimes  colloquially  referred  to  as  "GOTO-less  programming".  Vessey  and  Weber  note 
that,  while  few  negative  results  have  been  found,  absence  of  significant  results  is  as  frequent  as 
a  finding  of  positive  results,  a  development  that  they  attribute,  in  part,  to  the  fact  that 
researchers  have  not  adequately  controlled  for  other  factors.  They  note  the  difficulty  of 
achieving  such  control,  particularly  in  non-laboratory,  real  world  settings.  Therefore,  the 
question  of  a  positive  impact  of  structure  on  maintenance  costs  is  still  unanswered,  and 
requires  further  empirical  support.  This  suggests  the  following  research  question: 

Research  question  3:  Do  software  maintenance  costs  depend  significantly  upon  the  degree  of 
control  structure  complexity,  with  costs  rising  with  increases  in  complexity? 

In the following section we describe our approach to answering these research questions.

III. Research Approach

In  attempting  to  answer  the  research  questions  posed  above,  we  needed  to  test  the  impact  of 
complexity  on  real-world  systems,  and  to  attempt  to  control  for  other  factors  that  may  have  an 
impact  on  labor  productivity,  since  labor  costs  are  the  single  largest  cost  component  in 
commercial  software  maintenance.  For  this  purpose  we  began  with  the  data  and  model 
developed  in  our  previous  research  in  software  maintenance  productivity.  The  data  collection 
procedures  and  model  development  are  described  in  detail  in  [Kemerer,  1987]  and  [BDK, 
1991],  and  will  only  be  summarized  here. 


The Research Site

Data were collected at a major regional bank with a large investment in computer software. The
bank's  systems  contain  over  10,000  programs,  totalling  over  20  million  lines  of  code.  Almost 
all  of  them  are  COBOL  programs  running  on  large  IBM  mainframe  computers.  The  programs 
are  organized  into  application  systems  (e.g.  Demand  Deposits)  of  typically  100  -  300  programs 
each. Some of the bank's major application systems were written in the mid-1970's, and are
generally  acknowledged  to  be  more  poorly  designed  and  harder  to  maintain  than  more  recently 
written  software. 

The  software  environment  in  which  we  are  conducting  our  research  is  believed  to  be  a  quite 
typical  commercial  data  processing  environment.  The  empirically  based  results  of  the  research 
should, therefore, be highly generalizable to other commercial environments. The projects
analyzed  were  homogeneous  in  that  they  all  affected  COBOL  systems,  so  our  results  are  not 
confounded  by  the  effects  of  multiple  programming  languages. 

We  analyzed  65  software  maintenance  projects  from  17  major  application  systems  (see  Table  2 
in  Section  IV).  These  projects  were  carried  out  between  1985  and  1987.  An  average  project 
took about a thousand hours (at an accounting cost of $40 per hour) and changed or
created  about  five  thousand  source  lines  of  code. 

Modeling  maintenance  productivity 

Our  major  goal  in  this  study  is  to  evaluate  the  impact  of  software  complexity  on  maintenance 
labor  productivity.  In  order  to  do  so,  however,  we  must  control  for  the  effects  of  other 
factors,  such  as  task  magnitude  and  the  skill  of  the  developers,  that  also  affect  the  developer 
hours  required  on  a  project  [Gremillion,  1984].  Excluding  task  size  or  other  relevant  factors 
would  result  in  a  mis-specification  of  the  model  and  incorrect  inferences  about  the  impact  of 
software  complexity  on  costs.  For  example,  a  large  maintenance  project  dealing  with  an 
application  system  of  low  complexity  may  require  more  hours  than  another  project  meant  to 
make  a  small  modification  to  a  system  of  higher  complexity.  A  failure  to  control  for  the 
different  task  sizes  could  lead  us  to  the  unjustified  conclusion  that  higher  software  complexity 
will  result  in  lower  costs. 
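
The example above can be made concrete with a short sketch. The two projects and all of their figures are hypothetical, chosen only to show how raw hours and size-adjusted hours can point in opposite directions.

```python
def hours_per_ksloc(hours, sloc_changed):
    """Labor hours per thousand source lines changed (a size-adjusted cost)."""
    return hours / (sloc_changed / 1000)


# Hypothetical projects: (description, SLOC changed, labor hours)
low_complexity = ("large change, low-complexity system", 8000, 1200)
high_complexity = ("small change, high-complexity system", 1000, 300)

for name, sloc_changed, hours in (low_complexity, high_complexity):
    print(f"{name}: {hours} hours, "
          f"{hours_per_ksloc(hours, sloc_changed):.0f} hours per KSLOC")
```

Comparing total hours alone, the low-complexity project appears more expensive; once task size is accounted for, the high-complexity project requires more hours per unit of work. This is the kind of reversal that controlling for task magnitude is meant to prevent.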

Figure  1  presents  a  measurement  model  of  the  maintenance  function  developed  in  [BDK, 
1991].  Software  maintenance  is  viewed  as  a  production  process  whose  inputs  are  labor  and 
computing  resources  and  whose  output  is  the  modified  system.  Since  labor  hours  are 


considerably  more  expensive  than  computer  resources,  and  as  there  are  limited  substitution 
possibilities between the two, we focus upon labor hours as the major expense incurred in
software  maintenance.  The  productivity  of  this  process  depends  upon  a  number  of 
environmental  variables,  including  the  skill  and  experience  of  the  developers,  and  the  software 
tools  available  to  the  developers  [Boehm,  1987]. 


Activity          Output Measure         Input Measure

Analysis/Design   Function Points
                                         Total Labor Hours
Coding/Testing    Source Lines of Code

Figure 1: Measurement Model

We  write  labor  hours  required  as  a  function  of  the  task  requirement  and  environmental  factors: 

Labor Hours = f(Task magnitude and complexity, Developer skill, Developer
application experience, Working environment, Product quality, Software tools,
Software complexity)

The  unit  of  analysis  for  this  model  is  the  project  as  defined  by  the  bank.  Each  maintenance 
project  has  its  own  task  requirements  and  its  own  budget.  For  each  such  project,  the  following 
factors are considered:^4

a)  Task  magnitude:  The  output  of  the  software  maintenance  process  is  the  modified 
system.  The  amount  of  change  will  naturally  have  a  major  effect  on  the  amount  of 
work required.

b)  Task  complexity:  Other  things  being  equal,  some  maintenance  tasks  are  simply  more 
difficult  and  demanding  than  others.  This  may  be  because  they  demand  a  more 
sophisticated  level  of  programming.  It  may  be  because  the  task  specification  includes 
more  stringent  reliability  requirements.  In  either  case,  such  a  task  may  require  more 
maintenance  resources. 

c)  Developer  skill:  Previous  research  has  found  large  differences  in  productivity 
between  top  rated  developers  and  poorer  ones. 

d) Developer application experience: Even a good developer is at a disadvantage when
faced with an unfamiliar system, as time must be expended in comprehending the
software and becoming familiar with it.


^4 The interested reader is referred to [BDK, 1990] for references to each of these factors.


e)  Working  environment:  There  is  evidence  that  fast-turnaround  maintenance 
environments  enhance  developer  productivity. 

f)  Product  quality:  It  has  been  suggested  that  doing  a  careful  job  of  error-free 
programming  will  cost  more  than  a  rushed  job  would  cost,  although  its  benefits  will  be 
realized  in  the  long  term.  But,  there  are  those  who  believe  that  careful  and  systematic 
programming  may  not  take  any  longer,  some  even  arguing  that  it  should  be  less 
expensive. 

g)  Software  tools:  Many  commercially  available  products  have  been  designed  to 
increase  developer  productivity.  To  the  extent  that  they  do  so,  they  will  have  noticeable 
beneficial  effects  upon  maintenance  costs. 

h)  Software  complexity:  We  are  primarily  concerned  here  with  the  impact  of  this  factor 
upon  maintenance  costs.  Any  practical  cost  estimation  model,  however,  must  consider 
and  control  for  the  effects  of  other  factors  such  as  those  discussed  above. 


This  model  of  software  maintenance  costs  with  the  first  seven  factors  (i.e.,  not  including  "h) 
Software  complexity")  has  already  been  tested  at  the  research  site  [BDK,  1991].  The  model 
was explicitly designed to allow the introduction of new factors, and by introducing software
complexity  we  can  confirm  its  robustness.  Of  more  immediate  interest  is  that  we  can  test  the 
marginal  impact  of  software  complexity  upon  maintenance  costs.  And  we  can  compute  the 
actual  estimated  magnitude  of  the  cost  impact  of  complexity,  so  as  to  determine  the  extent  to 
which  the  effect  is  of  managerial  interest. 

Definitions 

The  following  definitions  will  be  used  throughout  the  rest  of  this  paper: 

*Module:  A  named,  separately  compilable  file  containing  COBOL  source  code.  A  module 
will  typically,  though  not  necessarily,  perform  a  single  logical  task,  or  set  of  tasks.  All  the 
modules  counted  in  the  analysis  were  of  this  type.  Modules  containing  COBOL  source  code 
but  not  the  headers  which  allow  it  to  be  run  on  its  own  (e.g.,  INCLUDE  modules  and  COPY 
files)  were  not  included. 

*Paragraph:  The  smallest  addressable  unit  of  a  COBOL  program.  A  sequence  of  COBOL 
executable  statements  preceded  by  an  address/identification  label.  This  construct  is  not 
precisely  paralleled  in  other  high  level  languages. 

*Procedure:  The  range  of  a  PERFORM  statement.  For  example,  if  paragraphs  are  labelled 
sequentially,  the  statement  PERFORM  D  THRU  G  invokes  the  procedure  consisting  of 
paragraphs  D,  E,  F,  G  and  the  paragraphs  invoked  by  these  paragraphs. 

*Component: The union of two or more overlapping procedures (e.g., PERFORM D THRU
G and PERFORM E THRU J will have at least E, F, and G in common). Measurement of
components  prevents  possible  double  counting  [Spratt  and  McQuilken,  1987].  Such  overlaps 
are  relatively  rare,  however,  with  the  result  that  components  and  procedures  behave  almost 
identically  for  all  statistical  purposes. 


*Application System: A set of modules assigned a common name by the bank, typically
performing  a  coherent  set  of  tasks  in  support  of  a  given  department,  and  maintained  by  a  single 
team.  References  to  this  term  refer  only  to  the  source  code,  not  to  the  JCL  and  other  material 
associated  with  it.  'Application'  or  'system',  if  used  separately,  mean  the  same  thing. 
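
The relationship between procedures and components defined above can be sketched as an interval merge. The sketch assumes that paragraphs are numbered in source order and that each PERFORM range is given as a (first, last) pair of paragraph positions; paragraphs invoked indirectly from within a range are ignored. The function name and the paragraph labelling are hypothetical.

```python
def merge_into_components(procedures):
    """Merge overlapping PERFORM ranges into components.

    Each procedure is a (first, last) pair of paragraph positions.
    Two procedures overlap when they share at least one paragraph.
    """
    components = []
    for first, last in sorted(procedures):
        if components and first <= components[-1][1]:
            # Shares a paragraph with the previous component: extend it.
            components[-1][1] = max(components[-1][1], last)
        else:
            components.append([first, last])
    return [tuple(c) for c in components]


# Paragraphs A..J mapped to positions 1..10 (hypothetical labelling).
pos = {label: i for i, label in enumerate("ABCDEFGHIJ", start=1)}
procedures = [
    (pos["D"], pos["G"]),  # PERFORM D THRU G
    (pos["E"], pos["J"]),  # PERFORM E THRU J
    (pos["A"], pos["B"]),  # PERFORM A THRU B
]
print(merge_into_components(procedures))
```

Here the two overlapping procedures (D through G and E through J) collapse into a single component spanning D through J, so their shared paragraphs are counted once, while the non-overlapping procedure remains a component on its own.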

Software  Complexity  Metrics 

A  number  of  steps  must  be  taken  before  it  can  be  determined  whether  reductions  in  software 
maintenance  costs  can  be  achieved  by  monitoring  and  controlling  software  complexity.  First, 
we  must  identify  appropriate  metrics  with  which  to  measure  software  complexity.  Having 
identified  such  measures,  we  can  then  attempt  to  establish  that  their  effects  are  managerially 
important, that is, that they do in fact have a large enough effect upon software costs to justify
possibly  significant  expenditures  by  those  wishing  to  control  them. 

The  first  step  was  accomplished  in  an  earlier  study  at  the  same  research  site  [Banker,  Datar  and 
Zweig, 1989]^5. We analyzed over five thousand application programs in order to develop a
basis  for  selecting  among  dozens  of  candidate  software  metrics  which  the  research  literature 
has  suggested.  Consistent  with  some  recent  research,  our  analysis  suggested  that  the  metrics 
which  we  analyzed  could  be  classified  into  three  major  groups,  measuring  three  distinct 
dimensions  of  software  complexity:  measures  of  module  size;  measures  of  procedure 
modularity;  and  measures  of  the  complexity  of  a  module's  control  structure  [BDZ,  1989] 
[Harrison et al., 1982] [Munson and Khoshgoftaar, 1989] [Rombach, 1987]. That research
also identified representative metrics from each group which could be expected to be
orthogonal to each other [BDZ, 1989].

In  the  current  study  we  undertake  the  second  step  required  to  validate  the  practical  use  of 
software  complexity  metrics  by  assessing  their  effect  upon  maintenance  costs.  We  used  a 
commercial  static  code  analyzer  to  compute  metrics  from  each  of  the  groups  of  metrics 
identified  earlier.  Three  software  complexity  metrics,  representing  each  of  the  previously 
identified  dimensions,  were  used  in  this  study.  Choice  of  these  three  metrics  was  based  upon 
the  ease  with  which  they  could  be  understood  by  software  maintenance  management  and  the 
ease of their collection. Given the typical high levels of correlation among complexity metric
groups  [Zweig,  1989]  [Munson  and  Khoshgoftaar,  1989],  this  approach  has  been 


^5 Hereafter referenced as "[BDZ, 1989]".
recommended by previous research [Shepperd, 1988]^6. Consistent with previous research,
we  used  module  length,  in  executable  statements  (STMTS)  for  the  first  metric,  a  measure  of 
size^7. The effect of this complexity metric will depend upon the application systems being
analyzed.  Module  for  module,  larger  modules  will  be  more  difficult  to  understand  and  modify 
than  small  ones,  and  maintenance  costs  will  be  expected  to  increase  with  module  size. 
However,  a  system  can  be  composed  of  too  many  small  modules  as  easily  as  too  few  large 
ones.  If  modules  are  too  small,  a  maintenance  project  will  spread  out  over  many  modules  with 
the attendant interface problems, and therefore maintenance costs could actually decrease as
module  size  increases. 

For  the  second  metric,  to  measure  procedure  modularity,  we  computed  the  average  size  of  a 
module's procedures (STMTCOMP)^8. The same argument concerning the effect of module
size applies here: if modules are broken into too many small components, then an
increase in average component size will be associated with a decrease in maintenance costs.
There is an almost universal tendency to associate large component size with poor modularity,
but  intuitively,  neither  extreme  is  likely  to  be  effective. 

A  third  dimension  of  software  complexity  was  the  complexity  of  the  module's  control 
structure.  The  initial  candidate  metric  chosen  for  this  dimension  was  the  proportion  of  the 
executable statements which were GOTO statements (GOTOSTMT). We selected a control
structure metric which was normalized for module size, so that it would not be confounded
with STMTS. This metric is also a measure of module decomposability, as the degree to
which a module can be decomposed into small and simple components depends directly upon
the  incidence  of  branching  within  the  module.  Highly  decomposable  modules  (modules  with 
low  values  of  GOTOSTMT)  should  be  less  costly  to  maintain,  since  a  developer  can  deal  with 
manageable  portions  of  the  module  in  relative  isolation. 

The density of GOTO statements (GOTOSTMT), like other candidate control metrics we
examined, is a measure of decomposability (each GOTO command makes a module more
difficult to understand by forcing a programmer to consider multiple portions of the module
simultaneously), but it does not distinguish between more and less serious structure
violations. A branch to the end of the current paragraph, for example, is unlikely to make that
paragraph much more difficult to comprehend, while a branch to a different section of the
module may [Vessey, 1986]. However, none of the existing structure metrics we examined
clearly differentiate between the two cases.

^6 However, in order to test the sensitivity of our results to choices of alternative metrics, the model described
below was re-estimated using other metrics. No significant changes in the results were found due to specific
metric choice.

^7 For these data this metric is highly correlated (Pearson correlation coefficient > .92) with other size metrics,
such as physical lines of code, and Halstead Length, Volume, and Effort [Zweig, 1989].

^8 This metric was found to be uncorrelated with STMTS (coefficient = .10).

The modules we analyzed have a large incidence of GOTO statements (approximately seven per
hundred executable statements), but if only a relatively small proportion of these are seriously
affecting maintainability, then the GOTOSTMT metric may be too noisy a measure of control
structure complexity. Empirically, over half of the GOTOs in these programs (19 GOTOs out
of 31 in the average module) are used to skip to the beginning or end of the current paragraph.
Such branches would not be expected to contribute noticeably to the difficulty of understanding
a module (in most high level languages other than COBOL they would probably not be
implemented by GOTO statements), and a metric such as GOTOSTMT, which does not
distinguish between these and the less benign 40% of the branch commands, will be
understandably imperfect.

To avoid this problem, a modified metric (GOTOFAR) was computed, which is the density of
the non-benign GOTO statements, i.e., the 40% of the GOTO statements which extend outside
the boundaries of the paragraph and which can be expected to seriously impair the
maintainability of the software. Since the automated static code analyzer was not able to
compute this metric, it was computed manually. Due to the large amount of time this
computation required, this metric was not computed for all the modules analyzed, but for a
random sample of approximately fifty modules per application system (about 1500 modules in
total, or approximately 30% of all modules)^9.
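
The four module-level metrics described above can be illustrated with a minimal sketch. The `Procedure` record, its field names, and the example counts are hypothetical and introduced only for illustration; the study itself obtained these values from a commercial static analyzer and, for GOTOFAR, from manual inspection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Procedure:
    """One paragraph of a module, summarized by hypothetical counts."""
    statements: int    # executable statements, GOTOs included
    gotos_local: int   # GOTOs to the beginning or end of this paragraph
    gotos_far: int     # GOTOs whose target lies outside this paragraph


def module_metrics(procedures: List[Procedure]) -> dict:
    """Compute STMTS, STMTCOMP, GOTOSTMT and GOTOFAR for one module."""
    stmts = sum(p.statements for p in procedures)
    gotos_far = sum(p.gotos_far for p in procedures)
    gotos_all = gotos_far + sum(p.gotos_local for p in procedures)
    return {
        "STMTS": stmts,                          # module size
        "STMTCOMP": stmts / len(procedures),     # average procedure size
        "GOTOSTMT": gotos_all / stmts,           # density of all GOTOs
        "GOTOFAR": gotos_far / stmts,            # density of non-benign GOTOs
    }


# A made-up module with three paragraphs.
example = [Procedure(40, 2, 1), Procedure(60, 1, 2), Procedure(100, 3, 1)]
metrics = module_metrics(example)
print(metrics)
```

Because both GOTO metrics are divided by the statement count, they are normalized for module size, which is the property the text relies on to keep them from being confounded with STMTS.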

Research  Hypotheses 

Based on the above research approach, we propose four specific, empirically testable research
hypotheses derived from the initial research questions:

Hypothesis 1: Controlling for other factors known to affect software maintenance costs,
software maintenance costs will depend significantly upon software
complexity, as measured by STMTS, STMTCOMP and GOTOFAR.

^9 A later sensitivity analysis regression using GOTOSTMT instead of GOTOFAR lends credence to our belief
that the excluded branch commands represent a noise factor: the estimated effect of GOTOSTMT had the same
relative magnitude as that of GOTOFAR, but the standard error of the coefficient was four times as large.

Hypothesis 2: Software maintenance costs will depend significantly upon average module size
as  measured  by  STMTS,  with  costs  rising  for  applications  whose  average  module  size  is  either 
too  large  or  too  small. 

Hypothesis  3:  Software  maintenance  costs  will  depend  significantly  upon  average  procedure 
size  as  measured  by  STMTCOMP,  with  costs  rising  for  applications  whose  average  procedure 
size  is  either  too  large  or  too  small. 

Hypothesis  4:  Software  maintenance  costs  will  depend  significantly  upon  the  density  of 
branching  as  measured  by  GOTOFAR,  with  costs  rising  with  increases  in  the  incidence  of 
branching. 

IV.  Model  and  Results 

Factors  Affecting  Maintenance  Costs 

While the focus of the current research is on assessing the effect of software complexity upon
maintenance  costs,  it  is  necessary  to  control  for  other  factors  known  to  affect  these  costs.  The 
most  significant  of  these,  of  course,  is  the  magnitude  of  the  maintenance  task.  To  control  for 
this,  and  for  other  factors  known  to  affect  costs,  we  began  with  a  previously  developed  model 
of  software  maintenance  costs  [BDK,  1991].  The  magnitude  of  the  maintenance  task  is 
measured  by  both  the  number  of  Function  Points  and  the  number  of  source  lines  of  code 
(SLOC)  added  or  changed  [Albrecht  and  Gaffney,  1983]  [Boehm,  1987]. 

Other factors shown to be significant in affecting project costs included:

• SKILL: The percent of developer hours billed to the most highly skilled (by formal
management evaluation) developers. This variable is quite distinct from the following
one, which depended upon the developers' experience with a specific application
system. [BDK, 1991]

• LOWEXPER: The extensive use (over 90% of hours billed to the project) of
developers lacking experience with the application being modified. (A binary variable.)
[BDK, 1991]

The values of these variables depended upon the number of hours billed to each project. This
information was obtained from the project billing files. [BDK, 1991]

• METHOD: The use of a structured design methodology. (A binary variable.) This is
expected to have an adverse effect upon single-project productivity, although it is meant
to reduce costs in the long run. [BDK, 1991]

• RESPONSE: The availability of a fast-turnaround programming environment. (A
binary variable.) [BDK, 1991]

The values of these binary variables were obtained from the project managers.

• QUALITY: A measure (on a three-point scale of low/medium/high quality) of the
degree to which the completion of the project was followed by an increase in the
number of operational errors. This measure was based upon information obtained from
the site's error logs. [BDK, 1991]

In a manner consistent with the software productivity literature, we model the effects of these
factors as proportional, rather than absolute [Boehm, 1981] [Albrecht and Gaffney, 1983].
Each explanatory factor is therefore weighted by a measure of project size, either FP or SLOC,
depending on whether it is thought to be associated more strongly with the analysis phase or
with the coding phase of the project.^10

In  testing  the  various  complexity  metrics,  we  shall  be  interested  in  their  impact  upon 
maintenance  costs  controlling  for  these  other  factors.  To  do  so,  we  shall  estimate  the  following 
model: 

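$$
\begin{aligned}
\mathrm{HOURS} ={}& \beta_0 + \beta_1\,\mathrm{FP} + \beta_2\,\mathrm{SLOC} + \beta_3\,\mathrm{FP}\cdot\mathrm{FP} + \beta_4\,\mathrm{SLOC}\cdot\mathrm{SLOC} + \beta_5\,\mathrm{FP}\cdot\mathrm{SLOC} \\
&+ \beta_6\,\mathrm{FP}\cdot\mathrm{LOWEXPER} + \beta_7\,\mathrm{FP}\cdot\mathrm{SKILL} + \beta_8\,\mathrm{FP}\cdot\mathrm{METHOD} \\
&+ \beta_9\,\mathrm{SLOC}\cdot\mathrm{QUALITY} + \beta_{10}\,\mathrm{SLOC}\cdot\mathrm{RESPONSE} \\
&+ \beta_{11}\,\mathrm{SLOC}\cdot\mathrm{STMTS} + \beta_{12}\,\mathrm{SLOC}\cdot\mathrm{STMTCOMP} + \beta_{13}\,\mathrm{SLOC}\cdot\mathrm{GOTOFAR} + \varepsilon
\end{aligned}
$$

This model, without the three complexity terms (the terms associated with parameters $\beta_{11}$
through $\beta_{13}$), has been previously validated at the research site [BDK, 1991]. In this model,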
project  costs  (measured  in  developer  HOURS)  are  primarily  a  function  of  project  size, 
measured  in  function  points  (FP)  and  in  source  lines  of  code  (SLOC).  The  number  of  hours 
was  obtained  from  the  site's  billing  files.  The  size  measures  were  computed  by  the 
development  staff  after  the  projects  were  complete.  In  order  to  model  the  known  nonlinearity 
of  development  costs  with  respect  to  project  size,  we  include  not  only  FP  and  SLOC,  but  also 
their  second-order  terms.  We  expect  this  to  result  in  a  high  degree  of  multicollinearity  among 
the  size  variables  which  will  make  the  interpretation  of  their  coefficients  difficult  [Banker  and 
Kemerer,  1989].  Those  coefficients,  however,  are  of  no  concern  to  us  for  examining  the 
current  research  hypotheses  relating  to  the  impact  of  complexity.  Table  2  presents  the 
summary  statistics  for  this  dataset. 


^10 It should be noted that any collinearity which may exist between the weighted complexity metrics and other
independent variables which have been weighted by SLOC will cause us to underestimate the significance of the
complexity metric variable. Therefore, the analysis presented below is a conservative test.

Table 2: Maintenance Project Summary Statistics*

VARIABLE        MEAN    STD. DEV.     MIN      MAX
HOURS            937          718     130     3342
FP               118          126       8      616
SLOC            5416         7230      50    31060
LOWEXPER         .68          .48       0        1
SKILL             63           34       0      100
METHOD           .29          .46       0        1
QUALITY         2.08          .53       1        3
RESPONSE         .63          .49       0        1
STMTS*           681          164     382     1104
STMTCOMP*         43           18      13       87
GOTOFAR*       0.026         0.02     0.0     0.07

*The values given for the complexity metrics are averages over multiple programs.

Analysis and Statistical Results

One  additional  extension  to  the  original  model  was  made  before  the  research  hypotheses  related 
to  complexity  could  be  tested.  We  expect  the  relationship  between  maintenance  costs  and 
procedure  size  to  be  U-shaped,  rather  than  monotonic,  with  costs  being  lowest  for  some 
optimal  size  and  higher  for  larger  or  smaller  sizes.  It  is  inappropriate,  then,  to  use  the 
STMTCOMP metric directly. Rather, developer productivity should decline as this metric
moves away from its optimum value. To model this effect we add the squared term,
STMTCOMP², and compute the joint effect of STMTCOMP and STMTCOMP² upon costs.
Our revised model, then, is:

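$$
\begin{aligned}
\mathrm{HOURS} ={}& \beta_0 + \beta_1\,\mathrm{FP} + \beta_2\,\mathrm{SLOC} + \beta_3\,\mathrm{FP}\cdot\mathrm{FP} + \beta_4\,\mathrm{SLOC}\cdot\mathrm{SLOC} + \beta_5\,\mathrm{FP}\cdot\mathrm{SLOC} \\
&+ \beta_6\,\mathrm{FP}\cdot\mathrm{LOWEXPER} + \beta_7\,\mathrm{FP}\cdot\mathrm{SKILL} + \beta_8\,\mathrm{FP}\cdot\mathrm{METHOD} \\
&+ \beta_9\,\mathrm{SLOC}\cdot\mathrm{QUALITY} + \beta_{10}\,\mathrm{SLOC}\cdot\mathrm{RESPONSE} \\
&+ \beta_{11}\,\mathrm{SLOC}\cdot\mathrm{STMTS} + \beta_{12}\,\mathrm{SLOC}\cdot\mathrm{STMTCOMP} \\
&+ \beta_{13}\,\mathrm{SLOC}\cdot\mathrm{STMTCOMP}^2 + \beta_{14}\,\mathrm{SLOC}\cdot\mathrm{GOTOFAR} + \varepsilon
\end{aligned}
$$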

Using Ordinary Least Squares regression, we estimated this model, which includes our
measures of module size, component size, and branching density. The results of this regression
are presented in Table 3, with the complexity metric variables listed last.
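
To make the estimation step concrete, the following is a minimal sketch of how a design matrix for the revised model could be assembled and fitted by OLS with NumPy. All project data here are synthetic and randomly generated (the ranges loosely follow Table 2), the sample size of 65 is an assumption, and the rounded Table 3 estimates are reused only as generating values; none of this reproduces the study's actual data or software.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 65  # assumed number of projects, for illustration only

# Synthetic project-level data (made up; ranges loosely follow Table 2).
fp = rng.uniform(8, 616, n)
sloc = rng.uniform(50, 31060, n)
lowexper = rng.integers(0, 2, n).astype(float)
skill = rng.uniform(0, 100, n)
method = rng.integers(0, 2, n).astype(float)
quality = rng.integers(1, 4, n).astype(float)
response = rng.integers(0, 2, n).astype(float)
stmts = rng.uniform(382, 1104, n)
stmtcomp = rng.uniform(13, 87, n)
gotofar = rng.uniform(0.0, 0.07, n)

# Design matrix in the order of the revised model (intercept first).
X = np.column_stack([
    np.ones(n), fp, sloc, fp * fp, sloc * sloc, fp * sloc,
    fp * lowexper, fp * skill, fp * method,
    sloc * quality, sloc * response,
    sloc * stmts, sloc * stmtcomp, sloc * stmtcomp**2, sloc * gotofar,
])

# Generating coefficients, used only to simulate HOURS for this sketch.
true_beta = np.array([333, 3.21, 0.34, 0.008, -3.1e-6, -1e-4,
                      0.105, -0.049, 1.78, 0.027, -0.019,
                      -1.2e-4, -0.011, 1.2e-4, 1.307])
hours = X @ true_beta + rng.normal(0, 100, n)

# Scale each column before solving to reduce numerical ill-conditioning,
# then map the coefficients back to the original units.
scale = np.abs(X).max(axis=0)
coef_scaled, *_ = np.linalg.lstsq(X / scale, hours, rcond=None)
beta_hat = coef_scaled / scale

residuals = hours - X @ beta_hat
r2 = 1.0 - residuals @ residuals / np.sum((hours - hours.mean()) ** 2)
print(f"R^2 on synthetic data: {r2:.3f}")
```

The column scaling is a numerical convenience rather than part of the model: the size terms differ by many orders of magnitude, which is also one reason the text warns that their individual coefficients are hard to interpret.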




Table 3: Regression Results

VARIABLE             MEAN    COEFFICIENT        t
HOURS                 937            n/a      n/a
Intercept             n/a            333     5.06
FP                    118           3.21     2.09
SLOC                 5416           0.34     6.57
FP*FP               29.6K          0.008     2.80
SLOC*SLOC             81M        -3.1E-6    -2.99
FP*SLOC              1.1M         -.0001    -1.34
FP*SKILL             8009          -.049    -3.50
FP*LOWEXPER            81          0.105     0.16
FP*METHOD              44           1.78     3.07
SLOC*QUALITY          11K          0.027     2.76
SLOC*RESPONSE        4341         -0.019    -1.17
=================================================
SLOC*STMTS           3.7M        -1.2E-4    -3.33
SLOC*STMTCOMP        241K          -.011    -4.90
SLOC*STMTCOMP²      12.0M         .00012     5.40
SLOC*GOTOFAR          152          1.307     3.22

$F_{14,50} = 31.07$. $R^2 = .89$. Adjusted $R^2 = .87$.

Although  not  all  of  the  regression  variables  are  highly  significant  for  this  sample,  no  attempt  is 
being  made  to  eliminate  variables  so  as  to  achieve  a  more  parsimonious  fit.  We  are  only 
interested  in  assessing  the  marginal  impact  of  adding  the  complexity  metrics.  Testing  for  the 
marginal significance of the complexity metrics (those below the double line, $\beta_{11}$ through $\beta_{14}$) we get:

$F_{4,50} = 12.01$ ($p = .0001$).

Testing for the joint significance of the two component size terms ($\beta_{12}$ and $\beta_{13}$) we get:

$F_{2,50} = 12.51$ ($p = .0001$).
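
The marginal and joint tests above are partial F tests, which compare the residual sum of squares of the full model with that of a model omitting the tested terms. The sketch below shows the generic computation on synthetic data; the function names and the data are hypothetical, and only the degrees of freedom (4 tested terms, 50 residual degrees of freedom from 15 estimated parameters) are chosen to mirror the text.

```python
import numpy as np


def ssr(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def partial_f(X_full, y, tested_cols):
    """F statistic for H0: the coefficients of tested_cols are all zero."""
    n, k = X_full.shape
    tested = set(tested_cols)
    keep = [j for j in range(k) if j not in tested]
    q = len(tested)                 # numerator degrees of freedom
    df_resid = n - k                # denominator degrees of freedom
    ssr_full = ssr(X_full, y)
    ssr_restricted = ssr(X_full[:, keep], y)
    f_stat = ((ssr_restricted - ssr_full) / q) / (ssr_full / df_resid)
    return f_stat, q, df_resid


# Synthetic example: intercept plus 14 regressors, the last 4 truly nonzero.
rng = np.random.default_rng(1)
n, k = 65, 15
X_demo = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y_demo = X_demo[:, -4:] @ np.full(4, 2.0) + rng.normal(size=n)

f_stat, q, df_resid = partial_f(X_demo, y_demo, tested_cols=[11, 12, 13, 14])
print(f"F({q},{df_resid}) = {f_stat:.2f}")
```

The same function with two tested columns gives the form of the joint test on the two component size terms.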

The addition of the second-order complexity term, STMTCOMP², to the original model
allows us to compute the minimum point of the U-shaped curve from the OLS estimates of the
model coefficients. (Since the relationship is quadratic, the stationary point may be computed
by dividing the negative of the coefficient of the linear term by twice that of the quadratic
term.)

At this site, the minimum-cost component size was computed to be (0.011/(2*0.00012)) = 45
executable statements per component (see Table 3). This value is very close to the mean (43)
and to the median (40) for this organization. However, individual applications vary in average
procedure size from 13 to 87 executable statements.^11
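
The stationary-point calculation can be written out explicitly. This is a sketch using the rounded coefficients from Table 3, holding SLOC and all other regressors fixed; the symbol $x$ for STMTCOMP is introduced here only for brevity, and $\beta_{12}$ and $\beta_{13}$ denote the coefficients on SLOC*STMTCOMP and SLOC*STMTCOMP².

$$
\frac{\partial\,\mathrm{HOURS}}{\partial x} = \mathrm{SLOC}\,(\beta_{12} + 2\beta_{13}\,x) = 0
\;\Longrightarrow\;
x^{*} = -\frac{\beta_{12}}{2\beta_{13}} = \frac{0.011}{2 \times 0.00012} \approx 45.8
$$

Because the estimate of $\beta_{13}$ is positive, the stationary point is a minimum. The rounded coefficients give a value just under 46; the reported figure of 45 presumably reflects the unrounded estimates.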

As  an  additional  test  of  the  robustness  of  these  results,  after  determining  the  minimum  value  of 
45,  we  developed  a  linear  model  incorporating  two  linear  variables  representing  the  deviations 
below and deviations above the optimum value. This model generated similar results ($R^2 = .90$,
adjusted $R^2 = .87$, $F_{14,50} = 31.15$).
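
One plausible construction of the two deviation variables is sketched below. The function name and the example values are hypothetical, and the text does not spell out the exact definition used; in the regression each term would presumably be weighted by SLOC in the same way as the other complexity terms.

```python
import numpy as np

OPTIMUM = 45.0  # minimum-cost component size reported in the text


def deviation_terms(stmtcomp, optimum=OPTIMUM):
    """Split STMTCOMP into non-negative deviations below and above the optimum."""
    x = np.asarray(stmtcomp, dtype=float)
    below = np.maximum(optimum - x, 0.0)   # how far under the optimum
    above = np.maximum(x - optimum, 0.0)   # how far over the optimum
    return below, above


below, above = deviation_terms([13, 43, 45, 87])
print(below.tolist())  # [32.0, 2.0, 0.0, 0.0]
print(above.tolist())  # [0.0, 0.0, 0.0, 42.0]
```

Giving each side its own coefficient lets the two arms of the curve have different slopes, which a single quadratic term cannot do.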

Analogous  to  the  U-shaped  relationship  between  maintenance  costs  and  procedure  size,  there  is 
also  reason  to  expect  a  similar  U-shaped  relationship  between  maintenance  cost  and  module 
size. An additional model was tested, adding a second-order term, STMTS². However, this
relationship was not supported by the data at this site, as the second-order term was found to be
statistically insignificant. The resulting coefficients showed that all the application systems
examined  fell  on  the  downward-sloping  portion  of  the  computed  curve.  In  fact,  a  direct 
plotting  of  the  data  confirmed  that  the  relationship  was  downward-sloping  and  approximately 
linear  across  the  observed  range  of  the  data,  so  no  second-order  term  was  included  for  this 
complexity  metric  in  the  final  model. 

The Belsley, Kuh, Welsch test of multicollinearity [Belsley, Kuh and Welsch, 1980] did not
show the complexity metrics to be significantly confounded with the other regression variables,
so we may interpret their coefficients with relative confidence. We also detected no significant
heteroskedasticity. This supports our decision to model the complexity effects in our
regression as proportional ones, rather than use the unweighted metrics alone.^12

^11 As is often the case in this type of estimation [Banker and Kemerer, 1989], there was a high degree of
multicollinearity between the linear term and the quadratic term, which requires the computed
minimum point to be interpreted with caution. Sensitivity analysis, using different minimum points, showed the
estimation to be insensitive to moderate variations in this value.

^12 If the complexity effects were not proportional to project magnitude, our use of the weighted metrics would
cause our model to overestimate the costs of large projects, resulting in residuals negatively correlated with size.

Tests of the Research Hypotheses

This analysis confirms our four hypotheses that software complexity increases maintenance
costs:

Hypothesis 1 was the general hypothesis that, controlling for the other explanatory factors,
software complexity has a significant impact upon software maintenance costs. This is
confirmed. Recall

$P(H_0\colon \beta_{11} = \beta_{12} = \beta_{13} = \beta_{14} = 0) = 0.0001$, as $F_{4,50} = 12.02$.

Hypothesis 2 was that maintenance costs would be significantly affected by module size. This
is confirmed.

$P(H_0\colon \beta_{11} = 0) = 0.001$, as $t_{50} = -3.33$.

We also tested for a U-shaped relationship between module size and software maintenance
costs. At this site the maintenance costs tended to be linear over the observed range of
module sizes, controlling for other factors. It should be noted, however, that while these data
do not indicate a U-shaped relationship, they are not necessarily inconsistent with such a
hypothesis. (The data can be seen as falling on the downward-sloping arm of this U, with the
possibility that, had sufficiently large modules been available, costs would again begin to
rise.)

Hypothesis  3  was  that  maintenance  costs  would  be  significantly  affected  by  procedure  size. 
This  hypothesis  is  confirmed  by  an  F  test  on  the  joint  effect  of  the  two  procedure-size  terms. 
Recall

$P(H_0\colon \beta_{12} = \beta_{13} = 0) = 0.0001$, as $F_{2,50} = 12.02$.

Again, we hypothesized a U-shaped relationship between procedure size and software
maintenance  costs.  At  this  site  the  data  are  supportive  of  the  U-shaped  hypothesis,  with  actual 
application  systems  observed  to  fall  on  both  arms  of  the  U,  and  minimum  costs  observed  for  a 
procedure  size  of  approximately  45  executable  statements. 

Hypothesis 4 was that maintenance costs would be significantly affected by the
density of branch instructions within the modules. This is confirmed.

$P(H_0\colon \beta_{14} = 0) = 0.002$, as $t_{50} = 3.22$.

V.  Software  Maintenance  Management  Results 

Through  the  above  analysis  we  have  estimated  the  effect  of  software  complexity  upon 
developer  productivity  in  a  maintenance  environment.  While  it  is  a  firmly  established  article  of 
conventional  wisdom  that  poor  programming  style  and  practices  increase  programming  costs, 
there has been little empirical evidence to support this notion. As a result, efforts and
investments meant to improve programming practices have had to be undertaken largely on
faith. We have extended an existing model of maintainer productivity and used it to confirm
the  significance  of  the  impact  of  software  complexity  upon  productivity.  We  used  a  model 
which  allowed  us  to  not  only  verify  this  significance,  but  also  to  estimate  the  magnitude  of  the 
effect.  The  existence  of  such  a  model  provides  managers  with  estimates  of  the  benefits  of 
improved  programming  practices  which  can  be  used  to  cost-justify  investments  designed  to 
improve those practices. Based upon the regression estimates in Table 3, the effects of the
metrics for projects of average size (about 5400 source lines of code) are approximately:

• 0.6 hours reduction for every statement added to average module size.

• 15 hours added for every statement deviation from an optimum average component
size of 45.

• 140 hours added for every 1% absolute increase in the proportion of statements
which are non-benign GOTO statements.

A  perhaps  more  informative  way  to  interpret  these  results  is  to  compute  the  percent  change  in 
average  project  costs  associated  with  metric  values  which  deviate  unfavorably  from  the 
research  site's  mean  values  by  one  standard  deviation.  The  advantage  of  this  approach  is  that 
we  know  we  are  comparing  the  more  complex  software  to  complexity  standards  observed  in 
practice  at  this  site,  rather  than  to  a  perhaps-arbitrary  ideal.  The  penalties  associated  with  these 
less  favored  complexity  scores  are: 

•  10%  of  total  costs  for  module  size. 

• 30% of total costs for procedure size^13.

•  15%  of  total  costs  for  branching  density. 
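
The one-standard-deviation penalties for the two linear complexity terms can be approximated from the published tables. The sketch below uses the rounded values transcribed from Tables 2 and 3 and evaluates each effect at the mean project size; the function names are hypothetical. Procedure size is left out because its effect is quadratic and asymmetric around the mean.

```python
# Rounded values transcribed from Tables 2 and 3.
MEAN_SLOC = 5416
MEAN_HOURS = 937
COEF = {"STMTS": -1.2e-4, "GOTOFAR": 1.307}   # coefficients on SLOC*metric
SD = {"STMTS": 164, "GOTOFAR": 0.02}          # standard deviations


def hours_per_unit(metric):
    """Change in HOURS per unit change in the metric, at mean project size."""
    return COEF[metric] * MEAN_SLOC


def one_sd_penalty(metric):
    """Extra hours, and share of mean project hours, for a one-SD
    move in the unfavorable (cost-raising) direction."""
    extra_hours = abs(hours_per_unit(metric)) * SD[metric]
    return extra_hours, extra_hours / MEAN_HOURS


stmts_hours, stmts_share = one_sd_penalty("STMTS")
goto_hours, goto_share = one_sd_penalty("GOTOFAR")
print(f"module size:       {stmts_hours:.0f} hours ({stmts_share:.0%})")
print(f"branching density: {goto_hours:.0f} hours ({goto_share:.0%})")
```

With these rounded inputs the shares come out near 11% and 15%, in line with the approximate figures quoted for module size and branching density.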

Armed with these quantified impacts of complexity, software maintenance managers can make
informed decisions regarding preferred managerial practice. For example, one type of decision
that could be aided by such information is the purchase of CASE reengineering tools. A great
many claims are made for such tools; improved programming practice is only one of them.
The benefits of these tools have also generally had to be taken on faith. Our analysis,
however, indicates that the magnitude of the economic impact of software complexity is
sufficiently great that many organizations may be able to justify the purchase and
implementation of CASE reengineering tools on the basis of these estimated benefits.

^13 There is an asymmetry here. The estimates are a 25% penalty for applications whose procedures are 1 SD
smaller than average and a 35% penalty for those whose procedures are 1 SD larger than average. We cannot
statistically reject the hypothesis that these two values are actually equal.

More  generally,  a  common  belief  in  the  long-term  importance  of  good  programming  practice 
has  generally  not  been  powerful  enough  to  stand  in  the  way  of  expedience  when  "quick-and- 
dirty"  programming  has  been  perceived  to  be  needed  immediately.  An  awareness  of  the 
magnitude of the cost of existing software complexity can combat this tendency. The cost of
correctable  complexity  at  this  research  site  amounts  to  several  million  dollars  per  year,  the 
legacy  of  the  practices  of  previous  years. 

Taken  together  these  ideas  show  how,  through  the  predictive  use  of  the  model  developed  here, 
managers  can  make  decisions  today  on  systems  design,  systems  development,  and  tool 
selection  and  purchase  that  depend  upon  system  values  that  will  affect  future  maintenance. 
This  can  be  a  valuable  addition  to  the  traditional  emphasis  on  current  on-time,  on-budget 
systems  development  in  that  it  allows  for  the  estimation  of  full  life-cycle  costs.  Given  the 
significant  percentages  of  systems  resources  devoted  to  maintenance,  improving  managers' 
ability  to  forecast  these  costs  will  allow  for  them  to  be  properly  weighted  in  current  decision- 
making. 

In  summary,  this  research  suggests  that  considerable  economic  benefits  can  be  expected  from 
adherence to appropriate programming practices. In particular, aspects of modular
programming, such as the maintenance of moderate procedure size and the limitation of
branching between procedures, seem to have great benefits. The informed use of tools or
techniques which encourage such practices should have a positive net benefit.

VI.  Concluding  Remarks 

In  this  study  we  have  investigated  the  links  between  software  complexity  and  software 
maintenance  productivity.  On  the  basis  of  an  analysis  of  software  maintenance  projects  in  a 
commercial  application  environment  we  confirmed  that  software  maintenance  costs  rise 
significantly  as  software  complexity  increases.  In  this  study  software  maintenance  costs  were 
found  to  increase  with  increases  in  the  complexity  of  a  program's  components,  as  measured  by 
the  programs'  average  module  size,  average  procedure  size,  and  control  structure  complexity. 

Historically,  most  models  of  software  labor  productivity  have  focused  on  new  development, 
and therefore have not explicitly used software complexity metrics. Our analysis at this site
suggests that, after controlling for other factors believed to affect maintenance costs, high
levels of software complexity account for approximately 30% of maintenance costs, or about
20% of total life-cycle costs. Therefore, the neglect of software complexity is potentially a
serious  omission. 

The results presented here are based upon a highly detailed analysis of programming costs at a
site we judge to be very typical of the traditional transaction processing environments which
account for such a considerable percentage of today's software maintenance costs. Based upon
this analysis, the aggregate cost of poor programming practice for industry is likely to be
substantial.^14


^14 Helpful comments from Tom Malone and Ron Weber are gratefully acknowledged.